
    The Exploitation of Web Navigation Data: Ethical Issues and Alternative Scenarios

    Nowadays, users' browsing activity on the Internet is not completely private: many entities collect and use such data, for either legitimate or illegal goals. The implications are serious, from a person who unconsciously exposes private information to an unknown third-party entity, to a company that is unable to control the flow of its information to the outside world. As a result, users have lost control over their private data on the Internet. In this paper, we present the entities involved in users' data collection and usage. Then, we highlight the ethical issues that arise for users, companies, scientists and governments. Finally, we present some alternative scenarios and suggestions for these entities to address such ethical issues. Comment: 11 pages, 1 figure

    Machine Learning and Big Data Methodologies for Network Traffic Monitoring

    Over the past 20 years, the Internet has seen an exponential growth of traffic, users, services and applications. Currently, it is estimated that the Internet is used every day by more than 3.6 billion users, who generate 20 TB of traffic per second. Such a huge amount of data challenges network managers and analysts to understand how the network is performing, how users are accessing resources, how to properly control and manage the infrastructure, and how to detect possible threats. Along with mathematical, statistical, and set theory methodologies, machine learning and big data approaches have emerged to build systems that aim at automatically extracting information from the raw data that network monitoring infrastructures offer. In this thesis I address different network monitoring solutions, evaluating several methodologies and scenarios. I show how, following a common workflow, it is possible to exploit mathematical, statistical, set theory, and machine learning methodologies to extract meaningful information from the raw data. Particular attention is given to machine learning and big data methodologies such as DBSCAN and the Apache Spark big data framework. The results show that, while mathematical, statistical, and set theory tools help characterize a problem, machine learning methodologies are very useful to discover hidden information in the raw data. Using the DBSCAN clustering algorithm, I show how YouLighter, an unsupervised methodology, groups caches serving YouTube traffic into edge-nodes and, later, by using the notion of Pattern Dissimilarity, identifies changes in their usage over time. By applying YouLighter to 10-month-long traces, I pinpoint sudden changes in YouTube edge-node usage, changes that also impair the end users' Quality of Experience.
I also apply DBSCAN in the deployment of SeLINA, a self-tuning tool implemented in the Apache Spark big data framework to autonomously extract knowledge from network traffic measurements. Using SeLINA, I show how to automatically detect the changes in the YouTube CDN previously highlighted by YouLighter. Along with these machine learning studies, I show how to use mathematical and set theory methodologies to investigate the browsing habits of Internet users. Using a two-week dataset, I show that over this period users keep discovering new websites, and that DNS information alone is not enough to build a reliable user profile. By exploiting mathematical and statistical tools, I then characterize Anycast-enabled CDNs (A-CDNs). I show that A-CDNs are widely used for both stateless and stateful services, and that they are quite popular: more than 50% of web users contact an A-CDN every day. Stateful services can benefit from A-CDNs, since their paths are very stable over time, as demonstrated by the presence of only a few anomalies in their Round Trip Time. Finally, I conclude by showing how I used BGPStream, an open-source software framework for the analysis of both historical and real-time Border Gateway Protocol (BGP) measurement data. Using BGPStream in real-time mode, I detected a Multiple Origin AS (MOAS) event and studied the propagation of the black-holing community, showing the effect of this community in the network. Then, using BGPStream in historical mode, together with the Apache Spark big data framework, over 16 years of data, I show results such as the continuous growth of IPv4 prefixes and the growth of MOAS events over time. All these studies aim to show how monitoring is a fundamental task in different scenarios, highlighting in particular the importance of machine learning and big data methodologies.
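
The DBSCAN clustering at the heart of YouLighter and SeLINA can be illustrated with a minimal, self-contained sketch. Note that the 2-D points, `eps`, and `min_pts` below are illustrative assumptions; the thesis applies the algorithm to cache/traffic features, not toy coordinates:

```python
from math import dist

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: return a cluster id per point, or -1 for noise."""
    labels = [None] * len(points)
    cid = 0

    def neighbors(i):
        # All points within eps of point i (including i itself).
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]

    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1          # noise (may become a border point later)
            continue
        labels[i] = cid             # i is a core point: start a new cluster
        seeds = [j for j in nbrs if j != i]
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cid     # previously-noise point joins as border
            if labels[j] is not None:
                continue
            labels[j] = cid
            jn = neighbors(j)
            if len(jn) >= min_pts:  # j is also a core point: keep expanding
                seeds.extend(jn)
        cid += 1
    return labels

# Two dense groups plus one outlier: points 0-2 and 3-5 form clusters,
# the far-away point is flagged as noise (-1).
pts = [(0, 0), (0.5, 0), (0, 0.5), (10, 10), (10.5, 10), (10, 10.5), (50, 50)]
print(dbscan(pts, eps=1.0, min_pts=3))   # [0, 0, 0, 1, 1, 1, -1]
```

Unlike k-means, DBSCAN needs no predefined number of clusters and isolates outliers, which is what makes it suitable for discovering edge-nodes and flagging unusual traffic.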

    Realistic testing of RTC applications under mobile networks

    The increasing usage of Real-Time Communication (RTC) applications for leisure and remote working calls for realistic and reproducible techniques to test them. They are used under very different network conditions: from high-speed broadband networks to noisy wireless links. As such, it is of paramount importance to assess the impact of the network on users' Quality of Experience (QoE), especially when it comes to application mechanisms such as video quality adjustment or transmission of redundant data. In this work, we lay the basis for a system in which a target RTC application is tested in an emulated mobile environment. To this end, we leverage ERRANT, a data-driven emulator which includes 32 distinct profiles modeling mobile network performance in different conditions. As a use case, we opt for Cisco Webex, a popular RTC application. We show how variable network conditions impact the packet loss and, in turn, trigger video quality adjustments, impairing the users' QoE.
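
The quality-adjustment behaviour observed in the study can be illustrated with a toy rate controller: when measured packet loss crosses a threshold, the sender steps down a bitrate ladder, and it steps back up once the network recovers. This is a hypothetical sketch, not Webex's actual algorithm; the ladder values and thresholds are invented:

```python
def adapt_bitrate(loss_samples, ladder=(2000, 1000, 500, 250),
                  up=0.01, down=0.05):
    """Pick a video bitrate (kbit/s) for each measured loss ratio:
    step down the ladder when loss exceeds `down`, step up when
    loss falls below `up`."""
    level = 0                 # index into the ladder (0 = best quality)
    chosen = []
    for loss in loss_samples:
        if loss > down and level < len(ladder) - 1:
            level += 1        # network degraded: lower the quality
        elif loss < up and level > 0:
            level -= 1        # network recovered: raise the quality
        chosen.append(ladder[level])
    return chosen

# Loss spikes (10-12%) force two downgrades; recovery restores quality.
losses = [0.0, 0.02, 0.10, 0.12, 0.0, 0.0]
print(adapt_bitrate(losses))  # [2000, 2000, 1000, 500, 1000, 2000]
```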

    Sodium hydroxide pretreatment as an effective approach to reduce the dye/holes recombination reaction in P-Type DSCs

    We report the synthesis of a novel squaraine dye (VG21-C12) and investigate its behavior as a p-type sensitizer for p-type Dye-Sensitized Solar Cells. The results are compared with O4-C12, a well-known sensitizer for p-DSCs, and sodium hydroxide pretreatment is described as an effective approach to reduce the dye/holes recombination. Investigations of various variables, such as dipping time, dye loading, photocurrent, and resulting cell efficiency, are also reported. Electrochemical impedance spectroscopy (EIS) was utilized to investigate the charge transport properties of the different photoelectrodes and the recombination phenomena that occur at the (un)modified electrode/electrolyte interface.

    What Scanners do at L7? Exploring Horizontal Honeypots for Security Monitoring

    Honeypots are a common means to collect data useful for threat intelligence. Most efforts in this area rely on vertical systems, targeting a specific scenario or service and analysing the data collected in that deployment. Here we extend the analysis of the visibility of honeypots by revisiting the problem from a horizontal perspective. We deploy a flexible honeypot system hosting multiple services, relying on the T-Pot project. We collect data for 5 months, recording millions of application requests from tens of thousands of sources. We compare whether and how the attackers interact with multiple services. We observe attackers that always focus on one or a few services, and others that target tens of services simultaneously. We dig further into the dataset, providing an initial horizontal analysis of brute-force attacks against multiple services. We show, for example, clear groups of attackers that rely on different password lists for different services. All in all, this work is our initial effort toward a horizontal system that can provide insights on attacks.
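
A horizontal analysis like the one described, comparing which services each source interacts with, can be sketched by measuring the Jaccard similarity of targeted-service sets. The source names and service mix below are made up for illustration, not taken from the T-Pot dataset:

```python
def jaccard(a, b):
    """Overlap between two sets of targeted services (1.0 = identical)."""
    return len(a & b) / len(a | b) if a | b else 0.0

# Hypothetical honeypot log: source -> set of services it contacted.
targets = {
    "src_A": {"ssh", "telnet"},
    "src_B": {"ssh", "telnet"},
    "src_C": {"ssh", "telnet", "smtp", "http", "ftp"},
}

# Focused attackers touch one or few services; broad scanners touch many.
focused = [s for s, svcs in targets.items() if len(svcs) <= 2]
print(focused)                                                 # ['src_A', 'src_B']
print(jaccard(targets["src_A"], targets["src_B"]))             # 1.0
print(round(jaccard(targets["src_A"], targets["src_C"]), 2))   # 0.4
```

The same pairwise-similarity idea extends to password lists: sources that reuse the same credential dictionary across services cluster together.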

    GLEm-Net: Unified Framework for Data Reduction with Categorical and Numerical Features

    In the era of Big Data, effective data reduction through feature selection is of paramount importance for machine learning. This paper presents GLEm-Net (Grouped Lasso with Embeddings Network), a novel neural framework that seamlessly processes both categorical and numerical features to reduce the dimensionality of data while retaining as much information as possible. By integrating embedding layers, GLEm-Net effectively manages categorical features with high cardinality and compresses their information into a lower-dimensional space. By using a grouped Lasso penalty function in its architecture, GLEm-Net simultaneously processes categorical and numerical data, efficiently reducing high-dimensional data while preserving the essential information. We test GLEm-Net on a real-world application in an industrial environment with 6 million records, each described by a mixture of 19 numerical and 7 categorical features with a strong class imbalance. A comparative analysis using state-of-the-art methods shows that, despite the difficulty of building a high-performance model, GLEm-Net outperforms the other methods in both feature selection and classification, with a better balance in the selection of both numerical and categorical features.
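
The grouped Lasso penalty behind this kind of selection sums, over feature groups (e.g. all embedding weights belonging to one categorical feature), the L2 norm of each group's weights; a group whose norm is driven to zero is dropped as a whole. A minimal sketch of the penalty term, with invented weights:

```python
from math import sqrt

def group_lasso_penalty(groups, lam=1.0):
    """Grouped Lasso regularizer: lambda * sum of per-group L2 norms.
    Unlike plain L1, it zeroes out entire feature groups at once."""
    return lam * sum(sqrt(sum(w * w for w in g)) for g in groups)

# Three feature groups: one numerical feature (a single weight) and two
# categorical features whose embedding weights form larger groups.
groups = [[3.0], [0.0, 0.0, 0.0], [3.0, 4.0]]
print(group_lasso_penalty(groups))   # 3.0 + 0.0 + 5.0 = 8.0
```

The second group contributes nothing to the penalty: during training its weights have been shrunk to zero, so the corresponding feature can be removed.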

    Measuring HTTP/3: Adoption and Performance

    The third version of the Hypertext Transfer Protocol (HTTP) is currently in its final standardization phase by the IETF. Besides better security and increased flexibility, it promises benefits in terms of performance. HTTP/3 adopts a more efficient header compression scheme and replaces TCP with QUIC, a transport protocol carried over UDP, originally proposed by Google and currently under standardization too. Although early HTTP/3 implementations already exist and some websites announce its support, it has been the subject of few studies. In this work, we provide a first measurement study of HTTP/3. We show how, during 2020, it was adopted by some of the leading Internet companies, such as Google, Facebook and Cloudflare. We run a large-scale measurement campaign toward thousands of websites adopting HTTP/3, aiming at understanding to what extent it achieves better performance than HTTP/2. We find that adopting websites often host most web page objects on third-party servers, which support only HTTP/2 or even HTTP/1.1. Our experiments show that HTTP/3 provides sizable benefits only in scenarios with high latency or very poor bandwidth. Despite the adoption of QUIC, we do not find benefits in case of high packet loss, but we observe large diversity across website providers' infrastructures.
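
Servers typically announce HTTP/3 support through the Alt-Svc response header, so a measurement campaign can detect adopters by parsing it. A minimal parser sketch; the header value is a representative example of the Alt-Svc syntax, not a string from the paper's dataset:

```python
def alt_svc_protocols(header):
    """Extract advertised protocol ids from an Alt-Svc header value,
    e.g. 'h3=":443"; ma=86400, h2=":443"' -> ['h3', 'h2']."""
    protos = []
    for entry in header.split(","):
        first = entry.strip().split(";")[0]   # drop parameters like ma=
        if "=" in first:
            protos.append(first.split("=")[0].strip())
    return protos

# 'h3-29' is an IETF draft version of HTTP/3 that servers advertised
# alongside the final 'h3' id during standardization.
hdr = 'h3=":443"; ma=86400, h3-29=":443", h2=":443"'
protos = alt_svc_protocols(hdr)
print(protos)                                                  # ['h3', 'h3-29', 'h2']
print(any(p == "h3" or p.startswith("h3-") for p in protos))   # True
```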

    Free Floating Electric Car Sharing: A Data Driven Approach for System Design

    In this paper, we study the design of a free floating car sharing system based on electric vehicles. We rely on data about millions of rentals of a free floating car sharing operator based on internal combustion engine cars that we recorded in four cities. We characterize the nature of rentals, highlighting the non-stationary and highly dynamic nature of usage patterns. Building on this data, we develop a discrete-event trace-driven simulator to study the usage of a hypothetical electric car sharing system. We use it to study the charging station placement problem, modeling different return policies, car battery charge and discharge due to trips, and the stochastic behavior of customers in plugging a car to a pole. Our data-driven approach helps car sharing providers gauge the impact of different design solutions. Our simulations show that it is preferable to place charging stations within popular parking areas where cars are parked for a short time (e.g., downtown). By smartly placing charging stations in just 8% of city zones, no trip ends with a discharged battery, i.e., all trips are feasible. Customers shall collaborate by bringing the car to a charging station when the battery level goes below a minimum threshold. This may reroute the customer to a different destination zone than the desired one; however, this happens in less than 10% of all trips.
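
The core of such a trace-driven simulation can be sketched as follows: replay rentals, discharge the battery with each trip, and reroute the car to a charging zone when the level drops below the threshold. All numbers here (battery range, threshold, trip distances, zone names) are illustrative assumptions, not the paper's parameters:

```python
def simulate(trips, capacity_km=150.0, threshold=0.2, charge_zones=("downtown",)):
    """Replay (distance_km, dest_zone) trips for one car; return the
    battery level (remaining km of range) after each trip and how
    many trips had to be rerouted to a charging zone."""
    level = capacity_km
    rerouted = 0
    levels = []
    for distance_km, dest in trips:
        level -= distance_km                 # discharge due to the trip
        if level / capacity_km < threshold:  # below threshold: must charge
            if dest not in charge_zones:
                rerouted += 1                # customer diverted to a pole
            level = capacity_km              # car plugged in and recharged
        levels.append(level)
    return levels, rerouted

# Third trip drains the battery below 20% outside a charging zone,
# so that customer is rerouted; the car then starts fresh again.
trips = [(60.0, "suburb"), (50.0, "suburb"), (30.0, "suburb"), (100.0, "downtown")]
print(simulate(trips))   # ([90.0, 40.0, 150.0, 50.0], 1)
```

The real simulator additionally models return policies, per-zone demand, and the probability that a customer actually plugs the car in; this sketch only captures the discharge/reroute bookkeeping.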

    E-Scooter Sharing: Leveraging Open Data for System Design

    With the shift toward a Mobility-as-a-Service paradigm, electric scooter sharing systems are becoming a popular means of transportation in cities. Given their novelty, we lack consolidated approaches to study and compare different system design options. In this work, we propose a simulation approach that leverages open data to create a demand model that captures and generalises the usage of this means of transportation in a city. This calls for ingenuity to deal with the coarse granularity of open data. In particular, we create a flexible, data-driven demand model by using modulated Poisson processes for temporal estimation and Kernel Density Estimation (KDE) for spatial estimation. We next use this demand model alongside a configurable e-scooter sharing simulator to compare the performance of different electric scooter sharing design options, such as the impact of the number of scooters and the cost of managing their charging. We focus on the municipalities of Minneapolis and Louisville, which provide large-scale open data about e-scooter sharing rides. Our approach lets researchers, municipalities and scooter sharing providers follow a data-driven approach to compare and improve the design of e-scooter sharing systems in smart cities.
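
The temporal half of such a demand model, a modulated (time-varying) Poisson process, can be sketched by giving each hour of the day its own arrival rate and drawing trip counts from a Poisson distribution with that rate. The hourly rate profile below is invented for illustration; the paper fits its rates from the open ride data (and uses KDE for the spatial part, omitted here):

```python
import random
from math import exp

def poisson_sample(lam, rng):
    """Draw one Poisson-distributed count (Knuth's multiplication method)."""
    l, k, p = exp(-lam), 0, 1.0
    while p > l:
        k += 1
        p *= rng.random()
    return k - 1

# Invented rate profile (expected trips/hour): quiet at night,
# commute peaks around 08:00 and 17:00-18:00.
rates = [1, 1, 1, 1, 1, 2, 4, 8, 12, 8, 6, 6,
         7, 6, 6, 7, 9, 12, 12, 9, 6, 4, 3, 2]

rng = random.Random(42)                       # seeded for reproducibility
day = [poisson_sample(r, rng) for r in rates] # simulated trips per hour
print(day)
```

Summing many simulated days reproduces the fitted daily profile, which is what lets the simulator generate synthetic yet realistic demand for design comparisons.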